Estimating the quality of data using provenance: a case study in eScience
Authors
Abstract
Data quality assessment is a key factor in data-intensive domains. The data deluge is aggravated by an increasing need for interoperability and cooperation across groups and organizations. New alternatives must be found to select the data that best satisfy users’ needs in a given context. This paper presents a strategy to provide information to support the evaluation of the quality of data sets. This strategy is based on combining metadata on the provenance of a data set (derived from workflows that generate it) and quality dimensions defined by the set’s users, based on the desired context of use. Our solution, validated via a case study, takes advantage of a semantic model to preserve data provenance related to applications in a specific domain.
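The abstract does not say how provenance metadata and user-defined quality dimensions are combined. The sketch below is only an illustration under stated assumptions. It assumes that each quality dimension can be computed as a score in [0, 1] from provenance attributes, and that a user's context of use can be expressed as weights over those dimensions. The record fields, the dimension functions (`reputation`, `timeliness`, `traceability`) and the weighted-average aggregation are all hypothetical. They are not the authors' model, which relies on a semantic model of provenance.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ProvenanceRecord:
    """Hypothetical provenance summary for one data set, e.g. gathered from the workflow runs that produced it."""
    dataset_id: str
    source_reliability: float  # assumed trust in the original data source, in [0, 1]
    age_days: int              # days since the generating workflow last ran
    steps_total: int           # workflow steps that produced the data set
    steps_documented: int      # steps whose provenance was actually recorded


def reputation(p: ProvenanceRecord) -> float:
    return p.source_reliability


def timeliness(p: ProvenanceRecord, horizon_days: int = 365) -> float:
    # Linear decay: 1.0 for fresh data, 0.0 once the data is older than the horizon.
    return max(0.0, 1.0 - p.age_days / horizon_days)


def traceability(p: ProvenanceRecord) -> float:
    # Fraction of workflow steps with recorded provenance.
    return p.steps_documented / p.steps_total if p.steps_total else 0.0


# Hypothetical catalogue of quality dimensions, each mapping provenance to a score in [0, 1].
DIMENSIONS: Dict[str, Callable[[ProvenanceRecord], float]] = {
    "reputation": reputation,
    "timeliness": timeliness,
    "traceability": traceability,
}


def quality_score(p: ProvenanceRecord, weights: Dict[str, float]) -> float:
    """Weighted average of the dimensions a user cares about in a given context of use."""
    unknown = set(weights) - set(DIMENSIONS)
    if unknown:
        raise ValueError(f"unknown quality dimensions: {sorted(unknown)}")
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(w * DIMENSIONS[name](p) for name, w in weights.items()) / total


def rank(records: List[ProvenanceRecord], weights: Dict[str, float]) -> List[ProvenanceRecord]:
    """Order candidate data sets from best to worst fit for the given context."""
    return sorted(records, key=lambda r: quality_score(r, weights), reverse=True)


if __name__ == "__main__":
    older = ProvenanceRecord("run-A", source_reliability=0.9, age_days=300, steps_total=4, steps_documented=4)
    newer = ProvenanceRecord("run-B", source_reliability=0.7, age_days=30, steps_total=5, steps_documented=2)
    recent_first = {"timeliness": 3.0, "reputation": 1.0}   # context: recent data matters most
    trust_first = {"reputation": 1.0, "traceability": 1.0}  # context: trusted, well-documented data
    print([r.dataset_id for r in rank([older, newer], recent_first)])  # ['run-B', 'run-A']
    print([r.dataset_id for r in rank([older, newer], trust_first)])   # ['run-A', 'run-B']
```

The same two data sets swap places when the weights change. This mirrors the abstract's point that the quality of a data set is judged relative to a desired context of use. A weighted average is just one simple way to aggregate the dimensions. Other aggregation schemes would fit the same idea equally well.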
Similar resources
Managing the Deluge of Scientific Data
Provenance information in eScience is metadata that's critical to effectively manage the exponentially increasing volumes of scientific data from industrial-scale experiment protocols. Semantic provenance, based on domain-specific provenance ontologies, lets software applications unambiguously interpret data in the correct context. The semantic provenance framework for eScience data comprises e...
Provenir ontology: Towards a Framework for eScience Provenance Management
Satya S. Sahoo, Amit P. Sheth (Kno.e.sis Center, Computer Science and Engineering Department, Wright State University, Dayton, OH-45324, USA; {sahoo.2, amit.sheth}@wright.edu). Abstract: Provenance metadata describes the “lineage” or history of an entity and necessary information to verify the quality of data, validate experiment protocols, and associate trust value with scientific result...
Ontology-Driven Provenance Management in eScience: An Application in Parasite Research
Provenance, from the French word “provenir”, describes the lineage or history of a data entity. Provenance is critical information in scientific applications to verify experiment process, validate data quality and associate trust values with scientific results. Current industrial scale eScience projects require an end-to-end provenance management infrastructure. This infrastructure needs to be ...
Using Provenance for Personalized Quality Ranking of Scientific Datasets
The rapid growth of eScience has led to an explosion in the creation and availability of scientific datasets that includes raw instrument data and derived datasets from model simulations. A large number of these datasets are surfacing online in public and private catalogs, often annotated with XML metadata, as part of community efforts to foster open research. With this rapid expansion comes th...
PrOM: A Semantic Web Framework for Provenance Management in Science
The eScience paradigm is enabling researchers to collaborate over the Web in virtual laboratories and conduct experiments on an industrial scale. But, the inherent variability in the quality and trust associated with eScience resources necessitates the use of provenance information describing the origin of an entity. Existing systems often model provenance using ambiguous terminology, have poor...
Publication date: 2013